Data Clustering Validation using Constraints
نویسندگان
چکیده
Much attention is being given to the incorporation of constraints into data clustering, mainly expressed in the form of must-link and cannot-link constraints between pairs of domain objects. However, its inclusion in the important clustering validation process was so far disregarded. In this work, we integrate the use of constraints in clustering validation. We propose three approaches to accomplish it: produce a weighted validity score considering a traditional validity index and the constraint satisfaction ratio; learn a new distance function or feature space representation which better suits the constraints, and use it with a validation index; and a combination of the previous. Experimental results in 14 synthetic and real data sets have shown that including the information provided by the constraints increases the performance of the clustering validation process in selecting the best number of clusters.
منابع مشابه
Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is a NP-hard problem, which must be performed for each semester frequently. The major technique in the presented approach would be analyzing data to resolve uncertainties of lecturers’ preferences and constraints within a department in order to obtain a ranking for each lecturer based on their requirements within a department where it is attempted to increase their satisfaction and develo...
متن کاملOn Data Clustering Analysis: Scalability, Constraints, and Validation
Clustering is the problem of grouping data based on similarity. While this problem has attracted the attention of many researchers for many years, we are witnessing a resurgence of interest in new clustering techniques. In this paper we discuss some very recent clustering approaches and recount our experience with some of these algorithms. We also present the problem of clustering in the presen...
متن کاملTowards Constrained Co-clustering in Ordered 0/1 Data Sets
Within 0/1 data, co-clustering provides a collection of biclusters, i.e., linked clusters for both objects and Boolean properties. Beside the classical need for grouping quality optimization, one can also use user-defined constraints to capture subjective interestingness aspects and thus to improve bi-cluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e...
متن کاملPrediction-Based Portfolio Optimization Model for Iran’s Oil Dependent Stocks Using Data Mining Methods
This study applied a prediction-based portfolio optimization model to explore the results of portfolio predicament in the Tehran Stock Exchange. To this aim, first, the data mining approach was used to predict the petroleum products and chemical industry using clustering stock market data. Then, some effective factors, such as crude oil price, exchange rate, global interest rate, gold price, an...
متن کاملSetting Priors and Enforcing Constraints on Matches for Nonlinear Registration of Meshes
We show that a simple probabilistic modelling of the registration problem for surfaces allows to solve it by using standard clustering techniques. In this framework, point-to-point correspondences are hypothesized between the two free-form surfaces, and we show how to specify priors and to enforce global constraints on these matches with only minor changes to the optimisation algorithm. The pur...
متن کامل